Dynamic WHT rounds in TurboQuant #7330
Conversation
Merging this PR will not alter performance
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
// Norms dtype must match the element ptype of the Vector, with the parent's nullability.
// Norms carry the validity of the entire TurboQuant array.
let element_ptype = vector_metadata.element_ptype();
let expected_norms_dtype = DType::Primitive(element_ptype, dtype.nullability());
Noooo, we want these to be the same or wider. E.g., I think for f16, norms should be f32.
I don't think we want that, as that is an implicit cast. If you wanted to store f16 with f32 precision, you should upcast it first (not inside this quantization encoding).
Oh, actually, if the multiplication overflows, maybe there is an argument for upcasting? But then we might as well always store norms as f64, because theoretically anything can overflow given large enough elements and enough dimensions... So I would rather keep the behavior as is.
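To make the overflow concern concrete, here is a minimal sketch (not TurboQuant code) showing how a squared norm can overflow at the element's precision while remaining finite in a wider type:

```rust
fn main() {
    // A single large element overflows the f32 squared norm...
    let x = 1.0e20f32;
    let norm_sq_f32 = x * x;
    assert!(norm_sq_f32.is_infinite());

    // ...while the same computation in f64 stays finite.
    let norm_sq_f64 = (x as f64) * (x as f64);
    assert!(norm_sq_f64.is_finite());

    println!("f32: {norm_sq_f32}, f64: {norm_sq_f64}");
}
```

This is the trade-off discussed above: storing norms at the element's own precision keeps the encoding cast-free, at the cost of possible overflow for extreme values.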
.iter()
.map(|&v| if v != 0 { 0u32 } else { F32_SIGN_BIT })
.collect();
// The storage is in inverse application order: round k-1 first, then k-2, ..., 0.
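For context, the sign-bit masks built above are useful because XOR-ing an f32's bit pattern with its sign bit negates the value, while XOR-ing with 0 leaves it unchanged. A minimal sketch (the `F32_SIGN_BIT` constant here is a local assumption mirroring the snippet, not necessarily the crate's definition):

```rust
// Assumed constant: the sign bit of an IEEE-754 f32.
const F32_SIGN_BIT: u32 = 0x8000_0000;

// XOR the mask into the float's bit pattern: flips the sign iff the
// mask is F32_SIGN_BIT, is a no-op iff the mask is 0.
fn apply_sign(x: f32, mask: u32) -> f32 {
    f32::from_bits(x.to_bits() ^ mask)
}

fn main() {
    assert_eq!(apply_sign(2.5, F32_SIGN_BIT), -2.5);
    assert_eq!(apply_sign(-2.5, F32_SIGN_BIT), 2.5);
    assert_eq!(apply_sign(2.5, 0), 2.5);
    println!("sign-flip sketch ok");
}
```

This branch-free formulation is a common trick for applying random sign flips (the D matrices in SORF) without multiplying by ±1.0.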
Is there any point in keeping it in inverse order?
No, not really; that's just how the previous implementation did it.
Given that we need both orders, I don't think it matters. I can change it in a follow-up PR.
Summary
Tracking issue: #7297
Adds support for a dynamic number of FWHT rounds in the SORF algorithm. Previously, this was hardcoded to 3 rounds.
Also fixes a small validation bug.
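For readers unfamiliar with the transform, one round of the fast Walsh-Hadamard transform (FWHT) mentioned above can be sketched as follows. This is a generic in-place butterfly implementation under the assumption that the input length is a power of two, not TurboQuant's actual code:

```rust
// In-place FWHT over a slice whose length is a power of two.
// Each pass combines pairs (a, b) into (a + b, a - b) at stride h.
fn fwht(data: &mut [f32]) {
    let n = data.len();
    debug_assert!(n.is_power_of_two());
    let mut h = 1;
    while h < n {
        for start in (0..n).step_by(h * 2) {
            for i in start..start + h {
                let (a, b) = (data[i], data[i + h]);
                data[i] = a + b;
                data[i + h] = a - b;
            }
        }
        h *= 2;
    }
}

fn main() {
    let mut v = [1.0f32, 0.0, 1.0, 0.0];
    fwht(&mut v);
    assert_eq!(v, [2.0, 2.0, 0.0, 0.0]);
    println!("{v:?}");
}
```

In SORF, several such rounds are interleaved with random sign flips; this PR makes the number of rounds configurable instead of fixed at 3.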
Testing
Existing tests suffice; regression tests were also added for the validation bug.